Speaker adaptation for audio-visual speech recognition

Authors

  • Gerasimos Potamianos
  • Alexandros Potamianos
Abstract

In this paper, speaker adaptation is investigated for audiovisual automatic speech recognition (ASR) using the multistream hidden Markov model (HMM). First, audio-only and visual-only HMM parameters are adapted by combining maximum a posteriori and maximum likelihood linear regression adaptation. Subsequently, the audio-visual HMM stream exponents are adapted to better capture the reliability of each modality for the specific speaker, by means of discriminative training. Various visual feature sets are compared, and features based on linear discriminant analysis are demonstrated to result in superior multispeaker and speaker-adapted recognition performance. In addition, visual feature mean normalization is shown to significantly improve visual-only and audio-visual ASR performance. Adaptation experiments on a 49-subject database are reported. On average, a 28% relative word error reduction is achieved by adapting the multi-speaker audiovisual HMM to each subject in the database.
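The stream-exponent mechanism described above can be sketched in a few lines: in a multistream HMM, each state's emission log-likelihood is a weighted sum of the per-modality log-likelihoods, and the exponents encode how reliable each stream is. The function and values below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of multistream HMM score combination with stream
# exponents. All names and numeric values are hypothetical examples.

def combined_log_likelihood(log_p_audio, log_p_visual,
                            lambda_audio, lambda_visual):
    """Weighted combination: log p = la * log p_a + lv * log p_v.

    The exponents (lambda_audio, lambda_visual) reflect the relative
    reliability of each modality and are what speaker adaptation of the
    stream weights would tune, e.g. discriminatively per speaker.
    """
    return lambda_audio * log_p_audio + lambda_visual * log_p_visual


# Example: clean audio, so the audio stream gets the larger exponent.
score = combined_log_likelihood(-10.0, -14.0, 0.7, 0.3)
print(score)  # approximately -11.2
```

In noisy acoustic conditions one would expect adaptation to shift weight toward the visual stream, which is the reliability trade-off the abstract refers to.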

Similar articles

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
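The MLLR side of the comparison above can be illustrated with its core operation: a single affine transform, estimated from adaptation data, is shared by a cluster of Gaussians and applied to their means (adapted mean = A·mu + b). The sketch below is a generic illustration of that transform, not the paper's estimation procedure; all names and values are assumptions.

```python
import numpy as np

# Hedged sketch of the MLLR mean transform: one (A, b) pair is shared
# across a cluster of Gaussian means, so even models with no direct
# adaptation data get updated through their cluster's transform.

def mllr_adapt_means(means, A, b):
    # means: (n_gaussians, d) stacked mean vectors
    # A: (d, d) rotation/scaling, b: (d,) bias, shared by the cluster
    return means @ A.T + b


means = np.array([[1.0, 2.0],
                  [0.5, -1.0]])      # two Gaussians in one cluster
A = np.eye(2) * 1.1                  # illustrative transform
b = np.array([0.1, -0.1])
adapted = mllr_adapt_means(means, A, b)
print(adapted)  # close to [[1.2, 2.1], [0.65, -1.2]]
```

MAP estimation, by contrast, updates only the parameters that actually receive adaptation data, which is why the blurb notes it needs more data for a comparable gain.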


Automatic speechreading of impaired speech

We investigate the use of visual, mouth-region information in improving automatic speech recognition (ASR) of the speech impaired. Given the video of an utterance by such a subject, we first extract appearance-based visual features from the mouth region-of-interest, and we use a feature fusion method to combine them with the subject’s audio features into bimodal observations. Subsequently, we a...
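The feature-fusion step mentioned above is, at its simplest, frame-synchronous concatenation of the audio and visual feature vectors into one bimodal observation. The dimensions below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Hedged sketch of frame-level feature fusion: per-frame audio and
# visual feature vectors are concatenated into a single bimodal
# observation vector. Feature dimensions are hypothetical examples.

def fuse_features(audio_feats, visual_feats):
    # audio_feats: (T, d_a), visual_feats: (T, d_v), frame-synchronous
    return np.concatenate([audio_feats, visual_feats], axis=-1)


audio = np.zeros((100, 39))   # e.g. cepstral features + derivatives
visual = np.zeros((100, 41))  # e.g. mouth-region appearance features
bimodal = fuse_features(audio, visual)
print(bimodal.shape)  # (100, 80)
```

In practice the two streams must first be brought to a common frame rate (video is typically slower than the audio front end), e.g. by interpolating the visual features.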


Improving Lip-reading with Feature Space Transforms for Audio-Visual Speech Recognition

In this paper we investigate feature space transforms to improve lip-reading performance for multi-stream HMM based audio-visual speech recognition (AVSR). The feature space transforms include non-linear Gaussianization transform and feature space maximum likelihood linear regression (fMLLR). We apply Gaussianization at the various stages of visual front-end. The results show that Gaussianizing...
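A rank-based Gaussianization transform of the kind mentioned above maps each feature value through the empirical CDF and then through the inverse standard-normal CDF, so the transformed values follow a roughly standard normal distribution. The sketch below is a generic one-dimensional version under that assumption, not the paper's exact front-end.

```python
import numpy as np
from statistics import NormalDist

# Hedged sketch of rank-based Gaussianization for one feature dimension:
# value -> empirical CDF position -> inverse standard-normal CDF.

def gaussianize(x):
    n = len(x)
    # rank of each element in sorted order (0 .. n-1)
    ranks = np.argsort(np.argsort(x))
    # empirical CDF positions, offset by 0.5/n to avoid 0 and 1
    u = (ranks + 0.5) / n
    return np.array([NormalDist().inv_cdf(p) for p in u])


feat = np.array([3.0, 1.0, 2.0])
print(gaussianize(feat))  # median value maps to 0.0; order is preserved
```

Because the mapping is monotone, it preserves the ordering of feature values while reshaping their marginal distribution, which is what makes it attractive at various stages of a visual front end.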


Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition

A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper we show that for audio-visual speech recognition (AVSR), FHMMs can be adopted as a novel method of training synchronous MSHMMs. MSHMMs, as proposed by several authors for use in AVSR, are jointly trained on both the ...



Journal:

Volume   Issue

Pages  -

Publication date: 1999